Identifying Unmaintained Projects in GitHub
Background: Open source software is increasingly important in modern
software development. However, there is also a growing concern about the
sustainability of such projects, which are usually managed by a small number of
developers, frequently working as volunteers. Aims: In this paper, we propose
an approach to identify GitHub projects that are not actively maintained. Our
goal is to alert users about the risks of using these projects and possibly
motivate other developers to assume the maintenance of the projects. Method: We
train machine learning models to identify unmaintained or sparsely maintained
projects, based on a set of features about project activity (commits, forks,
issues, etc.). We empirically validate the best-performing model with
the principal developers of 129 GitHub projects. Results: The proposed machine
learning approach achieves a precision of 80%, based on feedback from real open
source developers, and a recall of 96%. We also show that our approach can be
used to assess the risks of projects becoming unmaintained. Conclusions: The
model proposed in this paper can be used by open source users and developers to
identify GitHub projects that are no longer actively maintained.
Comment: Accepted at the 12th International Symposium on Empirical Software
Engineering and Measurement (ESEM), 10 pages, 201
Characterizing and Predicting Blocking Bugs in Open Source Projects
Software engineering researchers have studied specific types of issues, such as reopened bugs, performance bugs, and dormant bugs. However, one particularly severe type of bug is the blocking bug: a software bug that prevents other bugs from being fixed. Blocking bugs may increase maintenance costs, reduce overall quality, and delay the release of software systems. In this paper, we study blocking bugs in eight open source projects and propose a model to predict them early on. We extract 14 different factors from the bug repositories that are available within 24 hours of the initial submission of a bug report. Then, we build decision trees to predict whether a bug will be a blocking bug. Our results show that our prediction models achieve F-measures of 21%-54%, a two-fold improvement over the baseline predictors. We also analyze the fixes of these blocking bugs to understand their negative impact. We find that fixing blocking bugs requires touching more lines of code than fixing non-blocking bugs. In addition, our file-level analysis shows that files affected by blocking bugs are more negatively impacted in terms of cohesion, coupling, complexity, and size than files affected by non-blocking bugs.
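To make the F-measure figures and the "two-fold improvement" claim concrete, the sketch below computes F1 (the harmonic mean of precision and recall) from confusion-matrix counts and divides it by a baseline score. The counts are hypothetical, and so is the baseline: the abstract does not say which baseline predictors were used, so this assumes one whose expected F-measure equals the blocking-bug prevalence (which is what random labeling at the prevalence rate yields). The function names are illustrative only.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall) from counts."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def improvement_ratio(model_f: float, baseline_f: float) -> float:
    """How many times larger the model's F-measure is than the baseline's."""
    return model_f / baseline_f


# Hypothetical counts for one project's test set (not taken from the paper).
model_f = f_measure(tp=30, fp=40, fn=20)  # precision 3/7, recall 3/5
# Hypothetical baseline: expected F-measure equal to a 25% blocking-bug
# prevalence, as for a predictor that labels bugs at random at that rate.
baseline_f = 0.25
print(f"model F = {model_f:.2f}, {improvement_ratio(model_f, baseline_f):.1f}x baseline")
```

With these made-up numbers the model reaches F = 0.50, twice the assumed baseline. A low absolute F-measure can therefore still be a large relative gain when the positive class is rare.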
On Wasted Contributions: Understanding the Dynamics of Contributor-Abandoned Pull Requests
Pull-based development has enabled numerous volunteers to contribute to
open-source projects with fewer barriers. Nevertheless, a considerable number
of pull requests (PRs) with valid contributions are abandoned by their
contributors, wasting the effort and time put in by both the contributors and
maintainers. To better understand the underlying dynamics of
contributor-abandoned PRs, we conduct a mixed-methods study using both
quantitative and qualitative methods. We curate a dataset consisting of 265,325
PRs including 4,450 abandoned ones from ten popular and mature GitHub projects
and measure 16 features characterizing PRs, contributors, review processes, and
projects. Using statistical and machine learning techniques, we find that
complex PRs, novice contributors, and lengthy reviews have a higher probability
of abandonment and the rate of PR abandonment fluctuates alongside the
projects' maturity or workload. To identify why contributors abandon their PRs,
we also manually examine a random sample of 354 abandoned PRs. We observe that
the most frequent abandonment reasons are related to the obstacles faced by
contributors, followed by the hurdles imposed by maintainers during the review
process. Finally, we survey the top core maintainers of the studied projects to
understand their perspectives on dealing with PR abandonment and on our
findings.
Comment: Manuscript accepted for publication in ACM Transactions on Software
Engineering and Methodology (TOSEM)
Understanding the Helpfulness of Stale Bot for Pull-based Development: An Empirical Study of 20 Large Open-Source Projects
Pull Requests (PRs) that are neither progressed nor resolved clutter the list
of PRs, making it difficult for the maintainers to manage and prioritize
unresolved PRs. To automatically track, follow up, and close such inactive PRs,
Stale bot was introduced by GitHub. Despite its increasing adoption, there are
ongoing debates about whether using Stale bot alleviates or exacerbates the
problem of inactive PRs. To better understand if and how Stale bot helps
projects in their pull-based development workflow, we perform an empirical
study of 20 large and popular open-source projects. We find that Stale bot can
help deal with a backlog of unresolved PRs as the projects closed more PRs
within the first few months of adoption. Moreover, Stale bot can help improve
the efficiency of the PR review process: after adoption, the projects reviewed
PRs that were eventually merged, and resolved PRs that were eventually closed,
more quickly.
However, Stale bot can also negatively affect the contributors as the projects
experienced a considerable decrease in their number of active contributors
after the adoption. Therefore, relying solely on Stale bot to deal with
inactive PRs may lead to decreased community engagement and an increased
probability of contributor abandonment.
Comment: Manuscript submitted to ACM Transactions on Software Engineering and
Methodology
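The abstract describes Stale bot as tracking, following up on, and closing inactive PRs. The sketch below models that lifecycle as a per-PR decision: label a PR stale after a period of inactivity, remove the label if someone responds, and close it if it stays inactive through a grace period. The 60-day and 7-day thresholds, the `Action`/`PullRequest` types, and `triage` are hypothetical. This is a simplified model of the behavior, not Stale bot's actual code or configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "none"
    MARK_STALE = "mark_stale"
    UNMARK = "unmark"
    CLOSE = "close"


@dataclass
class PullRequest:
    number: int
    last_activity: datetime                 # last human comment, commit, or review
    stale_since: Optional[datetime] = None  # set when the bot labels the PR stale


def triage(pr: PullRequest, now: datetime,
           days_until_stale: int = 60, days_until_close: int = 7) -> Action:
    """Decide what a stale-PR bot would do with one PR (hypothetical thresholds)."""
    if pr.stale_since is None:
        # Not yet stale: label it once it has been inactive long enough.
        if now - pr.last_activity >= timedelta(days=days_until_stale):
            return Action.MARK_STALE
        return Action.NONE
    if pr.last_activity > pr.stale_since:
        # Someone responded after the stale label was applied: remove the label.
        return Action.UNMARK
    if now - pr.stale_since >= timedelta(days=days_until_close):
        # Still inactive after the grace period: close the PR.
        return Action.CLOSE
    return Action.NONE


now = datetime(2024, 1, 31)
print(triage(PullRequest(1, now - timedelta(days=61)), now))  # Action.MARK_STALE
print(triage(PullRequest(2, now - timedelta(days=70), now - timedelta(days=8)), now))  # Action.CLOSE
```

The close step applies only to PRs that nobody touched after they were labeled. That is why a contributor who is slow to return can lose a PR without any maintainer action, which is one plausible route to the drop in active contributors reported above.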
Where to Go Now? Finding Alternatives for Declining Packages in the npm Ecosystem
Software ecosystems (e.g., npm, PyPI) are the backbone of modern software
development. Developers add new packages to ecosystems every day to solve new
problems or provide alternative solutions, causing obsolete packages to decline
in their importance to the community. Packages in decline are reused less
over time and may become less frequently maintained. Thus, developers usually
migrate their dependencies to better alternatives. Replacing declining packages
with better alternatives requires developers to spend time and effort
identifying the packages that need to be replaced, finding alternatives,
assessing the benefits of migration, and, finally, performing the migration.
This paper proposes an approach that automatically identifies packages that
need to be replaced and finds their alternatives, supported by real-world
examples of open source projects performing the suggested migrations. At its
core, our approach relies on the dependency migration patterns performed in the
ecosystem to suggest migrations to other developers. We evaluated our approach
on the npm ecosystem and found that 96% of the suggested alternatives are
accurate. Furthermore, in a survey of expert JavaScript developers, 67%
indicated that they would use our suggested alternative packages in their
future projects.
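The idea of suggesting migrations from patterns already performed in the ecosystem can be sketched as frequency counting over dependency changes. The sketch assumes each commit is reduced to the set of dependencies it removed and the set it added. It pairs every removed package with every added package in the same commit (a deliberately naive heuristic) and recommends the replacements seen most often. The package names, `mine_migrations`, `suggest_alternatives`, and the `min_support` threshold are all hypothetical. The paper's own pattern-mining and filtering steps are not described in the abstract and are likely more involved.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# One dependency change per commit: (dependencies removed, dependencies added).
Change = Tuple[Set[str], Set[str]]


def mine_migrations(changes: Iterable[Change]) -> Dict[str, Counter]:
    """Count how often each removed package is paired with each added package."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for removed, added in changes:
        # Naive heuristic: every package removed in a commit is paired with
        # every package added in the same commit.
        for old in removed:
            for new in added:
                counts[old][new] += 1
    return counts


def suggest_alternatives(counts: Dict[str, Counter], package: str,
                         min_support: int = 2, top_k: int = 3) -> List[Tuple[str, int]]:
    """Return up to top_k replacements observed at least min_support times."""
    observed = counts.get(package, Counter())
    return [(new, n) for new, n in observed.most_common() if n >= min_support][:top_k]


# Hypothetical package names and commits, for illustration only.
history = [
    ({"old-http"}, {"http-a"}),
    ({"old-http"}, {"http-b"}),
    ({"old-http"}, {"http-a"}),
    ({"old-date"}, {"date-a"}),
]
counts = mine_migrations(history)
print(suggest_alternatives(counts, "old-http"))  # [('http-a', 2)]
```

The `min_support` cut-off filters out one-off co-changes that are probably unrelated. Each counted pair can also be traced back to the commits that produced it, which is the kind of real-world migration example the approach attaches to its suggestions.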
Predicting the First Response Latency of Maintainers and Contributors in Pull Requests
The success of a Pull Request (PR) depends on the responsiveness of the
maintainers and the contributor during the review process. Being aware of the
expected waiting times can lead to better interactions and managed expectations
for both the maintainers and the contributor. In this paper, we propose a
machine-learning approach to predict the first response latency of the
maintainers following the submission of a PR, and the first response latency of
the contributor after receiving the first response from the maintainers. We
curate a dataset of 20 large and popular open-source projects on GitHub and
extract 21 features to characterize projects, contributors, PRs, and review
processes. Using these features, we then evaluate seven types of classifiers to
identify the best-performing models. We also perform permutation feature
importance and SHAP analyses to understand the importance and impact of
different features on the predicted response latencies. Our best-performing
models achieve an average improvement of 33% in AUC-ROC and 58% in AUC-PR for
maintainers, as well as 42% in AUC-ROC and 95% in AUC-PR for contributors
compared to a no-skill classifier across the projects. Our findings indicate
that PRs submitted earlier in the week, containing an average or slightly
above-average number of commits, and with concise descriptions are more likely
to receive faster first responses from the maintainers. Similarly, PRs with a
lower first response latency from maintainers, that received the first response
of maintainers earlier in the week, and containing an average or slightly
above-average number of commits tend to receive faster first responses from the
contributors. Additionally, contributors with a higher acceptance rate and a
history of timely responses in the project are likely to both obtain and
provide faster first responses.
Comment: Manuscript submitted to IEEE Transactions on Software Engineering
(TSE)
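The reported gains are measured against a no-skill classifier, whose expected scores differ by metric. A random ranking scores 0.5 in AUC-ROC. In AUC-PR it scores approximately the positive-class prevalence, because its precision equals the prevalence at every threshold. The sketch below computes both baselines and a relative improvement. It assumes that "improvement" means (model − baseline) / baseline, which the abstract does not state explicitly. The labels and scores are hypothetical.

```python
from typing import Dict, Sequence


def no_skill_baselines(labels: Sequence[int]) -> Dict[str, float]:
    """Expected scores of a classifier that ranks examples at random."""
    prevalence = sum(labels) / len(labels)  # fraction of positive-class PRs
    # A random ranking gives AUC-ROC 0.5; its precision equals the prevalence
    # at every threshold, so AUC-PR is approximately the prevalence.
    return {"auc_roc": 0.5, "auc_pr": prevalence}


def relative_improvement(model_score: float, baseline_score: float) -> float:
    """Improvement over the baseline as a fraction of the baseline score."""
    return (model_score - baseline_score) / baseline_score


labels = [1, 0, 0, 0] * 25  # hypothetical: 25% positive class
base = no_skill_baselines(labels)
print(f"{relative_improvement(0.75, base['auc_roc']):.0%}")  # 50%
print(f"{relative_improvement(0.50, base['auc_pr']):.0%}")   # 100%
```

Because the AUC-PR baseline shrinks as the positive class becomes rarer, the same absolute score yields a larger relative AUC-PR gain. This is one reason relative improvements in AUC-PR can exceed those in AUC-ROC, as in the figures above.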